Helin Wang

SAM Audio Judge: A Unified Multimodal Framework for Perceptual Evaluation of Audio Separation

Jan 27, 2026

Summary of The Inaugural Music Source Restoration Challenge

Jan 07, 2026

SAM Audio: Segment Anything in Audio

Dec 19, 2025

Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization

Dec 17, 2025

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline

May 25, 2025

Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits

May 20, 2025

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Jan 27, 2025

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Sep 17, 2024

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer

Sep 12, 2024

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

Sep 11, 2024